Computer Vision

Jacobs University Bremen

Fall 2021

Homework 2

This notebook includes both coding and written questions. Please hand in this notebook file with all the outputs and your answers to the written questions.

This assignment covers linear filters, convolution and correlation.

Part 1: Convolutions

1.1 Commutative Property (10 points)

Recall that the convolution of an image $f:\mathbb{R}^2\rightarrow \mathbb{R}$ and a kernel $h:\mathbb{R}^2\rightarrow\mathbb{R}$ is defined as follows: $$(f*h)[m,n]=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty f[i,j]\cdot h[m-i,n-j]$$

Or equivalently, \begin{align} (f*h)[m,n] &= \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty h[i,j]\cdot f[m-i,n-j]\\ &= (h*f)[m,n] \end{align}

Show that this is true (i.e. prove that the convolution operator is commutative: $f*h = h*f$).

Your Answer: Write your solution in this markdown cell. Please write your equations in LaTex equations.

$$(f*h)[m,n] = \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty h[i,j]\cdot f[m-i,n-j] = (h*f)[m,n]$$

substitute $$x=m-i,y=m-j$$ therefore $$i=x-m,j=y-n$$ We therefore end up with $$(f*h)[m,n] = \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty h[x-m,y-n]\cdot f[x,y] = (h*f)[m,n]$$ however, the summation indices still depend on $$i,j$$, so we look at the cases:

case $i=-\infty$ : x = $\infty$

case $i=\infty$ : x = $-\infty$ we do the same procedure for y, resulting in: $$ \sum_{x=\infty}^{-\infty}\sum_{j=\infty}^{-\infty} h[x-m,y-n]\cdot f[x,y]$$ Since adding all values from positive infinity to negative infinity is the same if done backwards, we can flip the infinity signs in the summation. Furthermore, since multiplication is commutative, we can also flip the order of the operands $$ \sum_{x=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} \cdot f[x,y] h[x-m,y-n]$$ We therefore conclude that convolution is commutative

1.2 Implementation (30 points)

In this section, you will implement two versions of convolution:

First, run the code cell below to load the image to work with.

Now, implement the function conv_nested in filters.py. This is a naive implementation of convolution which uses 4 nested for-loops. It takes an image $f$ and a kernel $h$ as inputs and outputs the convolved image $(f*h)$ that has the same shape as the input image. This implementation should take a few seconds to run.

- Hint: It may be easier to implement $(hf)$*

We'll first test your conv_nested function on a simple input.

Now let's test your conv_nested function on a real image.

Let us implement a more efficient version of convolution using array operations in numpy. As shown in the lecture, a convolution can be considered as a sliding window that computes sum of the pixel values weighted by the flipped kernel. The faster version will i) zero-pad an image, ii) flip the kernel horizontally and vertically, and iii) compute weighted sum of the neighborhood at each pixel.

First, implement the function zero_pad in filters.py.

Next, complete the function conv_fast in filters.py using zero_pad. Run the code below to compare the outputs by the two implementations. conv_fast should run significantly faster than conv_nested.
Depending on your implementation and computer, conv_nested should take a few seconds and conv_fast should be around 5 times faster.


Part 2: Cross-correlation

Cross-correlation of two 2D signals $f$ and $g$ is defined as follows: $$(f\star{g})[m,n]=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty f[i,j]\cdot g[i-m,j-n]$$

2.1 Template Matching with Cross-correlation (12 points)

Suppose that you are a clerk at a grocery store. One of your responsibilites is to check the shelves periodically and stock them up whenever there are sold-out items. You got tired of this laborious task and decided to build a computer vision system that keeps track of the items on the shelf.

Luckily, you have learned in the Computer Vision class at Jacobs University that cross-correlation can be used for template matching: a template $g$ is multiplied with regions of a larger image $f$ to measure how similar each region is to the template.

The template of a product (template.jpg) and the image of shelf (shelf.jpg) is provided. We will use cross-correlation to find the product in the shelf.

Implement cross_correlation function in filters.py and run the code below.

- Hint: you may use the conv_fast function you implemented in the previous question.

Interpretation

How does the output of cross-correlation filter look? Was it able to detect the product correctly? Explain what problems there might be with using a raw template as a filter.

Your Answer: Write your solution in this markdown cell. The cross correlation is incorrect, it was not able to detect the product correctly. Using a template as a raw filter can be problematic because:


2.2 Zero-mean cross-correlation (6 points)

A solution to this problem is to subtract the mean value of the template so that it has zero mean.

Implement zero_mean_cross_correlation function in filters.py and run the code below.

You can also determine whether the product is present with appropriate scaling and thresholding.


2.3 Normalized Cross-correlation (12 points)

One day the light near the shelf goes out and the product tracker starts to malfunction. The zero_mean_cross_correlation is not robust to change in lighting condition. The code below demonstrates this.

A solution is to normalize the pixels of the image and template at every step before comparing them. This is called normalized cross-correlation.

The mathematical definition for normalized cross-correlation of $f$ and template $g$ is: $$(f\star{g})[m,n]=\sum_{i,j} \frac{f[i,j]-\overline{f_{m,n}}}{\sigma_{f_{m,n}}} \cdot \frac{g[i-m,j-n]-\overline{g}}{\sigma_g}$$

where:

Implement normalized_cross_correlation function in filters.py and run the code below.

Part 3: Separable Filters

3.1 Theory (10 points)

Consider an $M_1\times{N_1}$ image $I$ and an $M_2\times{N_2}$ filter $F$. A filter $F$ is separable if it can be written as a product of two 1D filters: $F=F_1F_2$.

For example, $$F= \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} $$ can be written as a matrix product of $$F_1= \begin{bmatrix} 1 \\ 1 \end{bmatrix}, F_2= \begin{bmatrix} 1 & -1 \end{bmatrix} $$ Therefore $F$ is a separable filter.

Prove that for any separable filter $F=F_1F_2$, $$I*F=(I*F_1)*F_2$$

Your Answer: Write your solution in this markdown cell. Please write your equations in LaTex equations. $$(I*F)[m,n]=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty F[i,j]\cdot I[m-i,n-j]$$ or, equivalently: $$(I*F)[m,n]=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty F[m-i,n-j]\cdot I[i,j]$$ since F can be decomposed into $$F=F_1F_2$$, we can write this equivalently as: $$(I*F)[m,n]=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty F_1[m-i]F_2[n-j]\cdot I[i,j]$$ since $F_1$ does not depend on j, we move it out of the summation $$(I*F)[m,n]=\sum_{i=-\infty}^\infty F_1[m-i] \sum_{j=-\infty}^\infty F_2[n-j]\cdot I[i,j]$$

looking at the second argument, we see that it is a convolution between F_2 and I, which we can extract from the equation

$$(I*F)[m,n]=\sum_{i=-\infty}^\infty F_1[m-i] \sum_{j=-\infty}^\infty F_2[n-j]\cdot I[i,j]$$$$F_2 * I [i,n] := \sum_{j=-\infty}^\infty F_2[n-j]\cdot I[i,j]$$

read this as: $F_2$, convolved with the 2-d image $I$, at row $i$

From this, we can conclude that we are effectively doing a convolution between $F_2$ and $I$, then convolving the result with $F_1$

$$(I*F)[m,n]= F_1 * ( F_2 * I)$$

since convolution is associative, this can be written as: $$I * F = F_1 * ( F_2 * I) = (F_1 * F_2) * I$$ $$ \square $$